103 research outputs found

    Video object segmentation introducing depth and motion information

    Get PDF
    We present a method to estimate the relative depth between objects in scenes of video sequences. The information for the estimation of the relative depth is obtained from the overlapping produced between objects when there is relative motion as well as from motion coherence between neighbouring regions. A relaxation labelling algorithm is used to solve conflicts and assign every region to a depth level. The depth estimation is used in a segmentation scheme which uses grey level information to produce a first segmentation. Regions of this partition are merged on the basis of their depth level.Peer ReviewedPostprint (published version

    Hierarchical morphological segmentation for image sequence coding

    Get PDF
    This paper deals with a hierarchical morphological segmentation algorithm for image sequence coding. Mathematical morphology is very attractive for this purpose because it efficiently deals with geometrical features such as size, shape, contrast, or connectivity that can be considered as segmentation-oriented features. The algorithm follows a top-down procedure. It first takes into account the global information and produces a coarse segmentation, that is, with a small number of regions. Then, the segmentation quality is improved by introducing regions corresponding to more local information. The algorithm, considering sequences as being functions on a 3-D space, directly segments 3-D regions. A 3-D approach is used to get a segmentation that is stable in time and to directly solve the region correspondence problem. Each segmentation stage relies on four basic steps: simplification, marker extraction, decision, and quality estimation. The simplification removes information from the sequence to make it easier to segment. Morphological filters based on partial reconstruction are proven to be very efficient for this purpose, especially in the case of sequences. The marker extraction identifies the presence of homogeneous 3-D regions. It is based on constrained flat region labeling and morphological contrast extraction. The goal of the decision is to precisely locate the contours of regions detected by the marker extraction. This decision is performed by a modified watershed algorithm. Finally, the quality estimation concentrates on the coding residue, all the information about the 3-D regions that have not been properly segmented and therefore coded. The procedure allows the introduction of the texture and contour coding schemes within the segmentation algorithm. The coding residue is transmitted to the next segmentation stage to improve the segmentation and coding quality. Finally, segmentation and coding examples are presented to show the validity and interest of the coding approach.Peer ReviewedPostprint (published version

    Context-unsupervised adversarial network for video sensors

    Get PDF
    This paper is an extended version of our conference paper: Pardàs, M. and Canet, G. Refinement Network for unsupervised on the scene Foreground Segmentation. In Proceedings of the 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands , 18–21 January 2021.Foreground object segmentation is a crucial first step for surveillance systems based on networks of video sensors. This problem in the context of dynamic scenes has been widely explored in the last two decades, but it still has open research questions due to challenges such as strong shadows, background clutter and illumination changes. After years of solid work based on statistical background pixel modeling, most current proposals use convolutional neural networks (CNNs) either to model the background or to make the foreground/background decision. Although these new techniques achieve outstanding results, they usually require specific training for each scene, which is unfeasible if we aim at designing software for embedded video systems and smart cameras. Our approach to the problem does not require specific context or scene training, and thus no manual labeling. We propose a network for a refinement step on top of conventional state-of-the-art background subtraction systems. By using a statistical technique to produce a rough mask, we do not need to train the network for each scene. The proposed method can take advantage of the specificity of the classic techniques, while obtaining the highly accurate segmentation that a deep learning system provides. We also show the advantage of using an adversarial network to improve the generalization ability of the network and produce more consistent results than an equivalent non-adversarial network. The results provided were obtained by training the network on a common database, without fine-tuning for specific scenes. Experiments on the unseen part of the CDNet database provided 0.82 a F-score, and 0.87 was achieved for LASIESTA databases, which is a database unrelated to the training one. On this last database, the results outperformed by 8.75% those available in the official table. The results achieved for CDNet are well above those of the methods not based on CNNs, and according to the literature, among the best for the context-unsupervised CNNs systems.This work has been supported by the Spanish Research Agency (AEI) under project PID2020-116907RB-I00.Peer ReviewedPostprint (published version

    3D point cloud video segmentation oriented to the analysis of interactions

    Get PDF
    Given the widespread availability of point cloud data from consumer depth sensors, 3D point cloud segmentation becomes a promising building block for high level applications such as scene understanding and interaction analysis. It benefits from the richer information contained in real world 3D data compared to 2D images. This also implies that the classical color segmentation challenges have shifted to RGBD data, and new challenges have also emerged as the depth information is usually noisy, sparse and unorganized. Meanwhile, the lack of 3D point cloud ground truth labeling also limits the development and comparison among methods in 3D point cloud segmentation. In this paper, we present two contributions: a novel graph based point cloud segmentation method for RGBD stream data with interacting objects and a new ground truth labeling for a previously published data set. This data set focuses on interaction (merge and split between ’object’ point clouds), which differentiates itself from the few existing labeled RGBD data sets which are more oriented to Simultaneous Localization And Mapping (SLAM) tasks. The proposed point cloud segmentation method is evaluated with the 3D point cloud ground truth labeling. Experiments show the promising result of our approach.Postprint (published version

    Refinement network for unsupervised on the scene foreground segmentation

    Get PDF
    Unsupervised learning represents one of the most interesting challenges in computer vision today. The task has an immense practical value with many applications in artificial intelligence and emerging technologies, as large quantities of unlabeled images and videos can be collected at low cost. In this paper, we address the unsupervised learning problem in the context of segmenting the main foreground objects in single images. We propose an unsupervised learning system, which has two pathways, the teacher and the student, respectively. The system is designed to learn over several generations of teachers and students. At every generation the teacher performs unsupervised object discovery in videos or collections of images and an automatic selection module picks up good frame segmentations and passes them to the student pathway for training. At every generation multiple students are trained, with different deep network architectures to ensure a better diversity. The students at one iteration help in training a better selection module, forming together a more powerful teacher pathway at the next iteration. In experiments, we show that the improvement in the selection power, the training of multiple students and the increase in unlabeled data significantly improve segmentation accuracy from one generation to the next. Our method achieves top results on three current datasets for object discovery in video, unsupervised image segmentation and saliency detection. At test time, the proposed system is fast, being one to two orders of magnitude faster than published unsupervised methods. We also test the strength of our unsupervised features within a well known transfer learning setup and achieve competitive performance, proving that our unsupervised approach can be reliably used in a variety of computer vision tasks.During the development of this work the first author was a visitor at TOSHIBA Cambridge Research Lab. This work has been carried out with the support of this lab and project TEC2016-75976-R, by the Ministerio de Economia, Industria y Competitividad and the European Regional Development Fund.Peer ReviewedPostprint (published version

    Segmentation-based video coding:temporals links

    Get PDF
    This paper analyzes the main elements that a segmentation-based video coding approach should be based on so that it can address coding efficiency and content-based functionalities. Such elements can be defined as temporal linking and rate control. The basic features of such elements are discussed and, in both cases, a specific implementation is proposed.Peer ReviewedPostprint (published version

    Temporally coherent 3D point cloud video segmentation in generic scenes

    Get PDF
    © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Video segmentation is an important building block for high level applications, such as scene understanding and interaction analysis. While outstanding results are achieved in this field by the state-of-the-art learning and model-based methods, they are restricted to certain types of scenes or require a large amount of annotated training data to achieve object segmentation in generic scenes. On the other hand, RGBD data, widely available with the introduction of consumer depth sensors, provide actual world 3D geometry compared with 2D images. The explicit geometry in RGBD data greatly help in computer vision tasks, but the lack of annotations in this type of data may also hinder the extension of learning-based methods to RGBD. In this paper, we present a novel generic segmentation approach for 3D point cloud video (stream data) thoroughly exploiting the explicit geometry in RGBD. Our proposal is only based on low level features, such as connectivity and compactness. We exploit temporal coherence by representing the rough estimation of objects in a single frame with a hierarchical structure and propagating this hierarchy along time. The hierarchical structure provides an efficient way to establish temporal correspondences at different scales of object-connectivity and to temporally manage the splits and merges of objects. This allows updating the segmentation according to the evidence observed in the history. The proposed method is evaluated on several challenging data sets, with promising results for the presented approach.Peer ReviewedPostprint (author's final draft

    Bayesian foreground segmentation and tracking using pixel-wise background model and region-based foreground model

    Get PDF
    In this paper we present a segmentation system for monocular video sequences with static camera that aims at foreground/ background separation and tracking. We propose to combine a simple pixel-wise model for the background with a general purpose region based model for the foreground. The background is modeled using one Gaussian per pixel, thus achieving a precise and easy to update model. The foreground is modeled using a Gaussian Mixture Model with feature vectors consisting of the spatial (x, y) and colour (r, g, b) components. The spatial components of this model are updated using the Expectation Maximization algorithm after the classification of each frame. The background model is formulated in the 5 dimensional feature space in order to be able to apply a Maximum A Posteriori framework for the classification. The classification is done using a graph cut algorithm that allows taking into account neighborhood information. The results presented in the paper show the improvement of the system in situations where the foreground objects have similar colors to those of the background.Peer ReviewedPostprint (published version

    One shot learning for generic instance segmentation in RGBD videos

    Get PDF
    Hand-crafted features employed in classical generic instance segmentation methods have limited discriminative power to distinguish different objects in the scene, while Convolutional Neural Networks (CNNs) based semantic segmentation is restricted to predefined semantics and not aware of object instances. In this paper, we combine the advantages of the two methodologies and apply the combined approach to solve a generic instance segmentation problem in RGBD video sequences. In practice, a classical generic instance segmentation method is employed to initially detect object instances and build temporal correspondences, whereas instance models are trained based on the few detected instance samples via CNNs to generate robust features for instance segmentation. We exploit the idea of one shot learning to deal with the small training sample size problem when training CNNs. Experiment results illustrate the promising performance of the proposed approach.Peer ReviewedPostprint (published version

    A segmentation-based coding system allowing manipulation of objects (sesame)

    Get PDF
    We present a coding scheme that achieves, for each image in the sequence, the best segmentation in terms of rate-distortion theory. It is obtained from a set of initial regions and a set of available coding techniques. The segmentation combines spatial and motion criteria. It selects at each area of the image the most adequate criterion for defining a partition in order to obtain the best compromise between cost and quality. In addition, the proposed scheme is very suitable for addressing content-based functionalities.Peer ReviewedPostprint (published version
    • …
    corecore